Non-negative Matrix Factorization
to identify Latent Factors
Underlying Psychopathology

Laura Sità

Problem

Classification of psychopathology

Traditional taxonomies: categorical diagnostic systems

(e.g., DSM, ICD)

Recent models: dimensional and transdiagnostic approaches

(e.g., the Hierarchical Taxonomy of Psychopathology model by Kotov et al., 2017)

Research Question

How can we uncover latent factors across disorders?

  • Factor Analysis
  • Proposed approach (inspired by Landy et al., 2025): Non-negative Matrix Factorization (NMF)

NMF

NMF factorizes the full observed data matrix

\(M \in \mathbb{R}_{\ge 0}^{K \times G}\), with entries \(M_{kg}\),
where each row \(k\) represents an observed feature (e.g., test item)
and each column \(g\) represents an individual.

It decomposes \(M\) into two lower-rank nonnegative matrices:

\(P \in \mathbb{R}_{\ge 0}^{K \times N}\) → loadings of observed variables on \(N\) latent factors

\(E \in \mathbb{R}_{\ge 0}^{N \times G}\) → expression (or weights) of latent factors across individuals

\[ M_{kg} \approx \sum_{n=1}^{N} P_{kn} E_{ng} \]
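
As a rough illustration of this decomposition, the base R sketch below estimates \(P\) and \(E\) with the classical multiplicative updates for a squared-error loss. This is a simpler, swapped-in algorithm, not the Bayesian procedure implemented in bayesNMF; the function name `nmf_mu`, the toy data, and the tuning values are assumptions made for the example.

```r
# Minimal NMF via multiplicative updates (squared-error loss).
# Illustrative only: this is NOT the algorithm used by bayesNMF.
nmf_mu <- function(M, N, n_iter = 500, eps = 1e-9, seed = 1) {
  set.seed(seed)
  K <- nrow(M)
  G <- ncol(M)
  P <- matrix(runif(K * N), K, N)  # loadings (K x N), random non-negative start
  E <- matrix(runif(N * G), N, G)  # factor expression (N x G)
  for (i in seq_len(n_iter)) {
    # Element-wise updates keep P and E non-negative
    E <- E * (t(P) %*% M) / (t(P) %*% P %*% E + eps)
    P <- P * (M %*% t(E)) / (P %*% E %*% t(E) + eps)
  }
  list(P = P, E = E)
}

# Toy data (hypothetical): K = 6 items, G = 20 individuals, N = 2 factors
set.seed(42)
P_true <- matrix(runif(6 * 2), 6, 2)
E_true <- matrix(runif(2 * 20), 2, 20)
M <- P_true %*% E_true

fit <- nmf_mu(M, N = 2)

# Relative reconstruction error of M by P %*% E
rel_err <- norm(M - fit$P %*% fit$E, "F") / norm(M, "F")
print(rel_err)
```

The factorization is only identified up to the ordering and scaling of the factors, so the recovered \(P\) and \(E\) need not match `P_true` and `E_true` column by column even when the reconstruction is good.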

First step

  • Apply NMF to a dataset of ordinal (Likert-scale) data

  • Estimate latent factors

  • Compare them with those obtained through factor analysis

Item × factor matrices: EFA vs. NMF

Item × factor matrices: CFA vs. NMF
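
One possible way to compare two item × factor matrices is Tucker's congruence coefficient, which is insensitive to the scale of each factor and therefore tolerates the arbitrary scaling of NMF loadings. The sketch below is only a suggestion under that assumption; the helper `congruence` and the loading values are hypothetical and do not come from the analysis.

```r
# Tucker's congruence coefficient between the columns (factors)
# of two item x factor matrices A and B with the same items in the rows.
congruence <- function(A, B) {
  num <- t(A) %*% B
  den <- sqrt(colSums(A^2)) %o% sqrt(colSums(B^2))
  num / den
}

# Hypothetical loadings for 4 items on 2 factors
items <- paste0("item", 1:4)
L_efa <- matrix(c(0.8, 0.7, 0.1, 0.0,
                  0.1, 0.0, 0.9, 0.6),
                ncol = 2, dimnames = list(items, c("EFA1", "EFA2")))
# Same structure, but with factors swapped and rescaled (as NMF may return)
L_nmf <- matrix(c(0.2, 0.0, 1.8, 1.2,
                  1.6, 1.4, 0.2, 0.0),
                ncol = 2, dimnames = list(items, c("NMF1", "NMF2")))

phi <- congruence(L_efa, L_nmf)
print(round(phi, 2))
```

Rows of `phi` index the EFA factors and columns the NMF factors; values close to 1 suggest that two factors load on the items in the same proportions.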

Second step

How to make NMF recognize that items within the same questionnaire share a structure

Extensions

If the results are encouraging …

  • Third step: use NMF to find latent factors shared within the same spectrum of symptoms

  • Fourth step: use causal NMF to find latent factors across different treatments on the same spectrum of symptoms

Materials

All materials are available on GitHub at laurasitaunipd/nmf

Bibliography

Kotov, R., Krueger, R. F., Watson, D., Achenbach, T. M., Althoff, R. R., Bagby, R. M., … & Zimmerman, M. (2017). The Hierarchical Taxonomy of Psychopathology (HiTOP): A dimensional alternative to traditional nosologies. Journal of Abnormal Psychology, 126(4), 454.

Landy, J. M., Basava, N., & Parmigiani, G. (2025). bayesNMF: Fast Bayesian Poisson NMF with Automatically Learned Rank Applied to Mutational Signatures. arXiv preprint arXiv:2502.18674.

Landy, J. M., Zorzetto, D., De Vito, R., & Parmigiani, G. (2025). Causal Inference for Latent Outcomes Learned with Factor Models. arXiv preprint arXiv:2506.20549.

Supplemental Materials

EFA

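# Exploratory factor analysis: 5 factors, oblique (promax) rotation
risultato_fa <- factanal(item_data_complete,
                         factors = 5,
                         rotation = "promax",
                         scores = "regression")
# Regression factor scores (individuals x factors);
# assumed to be the source of efa_scores used in the comparison
efa_scores <- risultato_fa$scores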

bayesNMF package function

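library(bayesNMF)

# Fit NMF with a normal likelihood and truncated-normal priors, rank fixed at 5
result <- bayesNMF(
  data = M_t,
  likelihood = "normal",
  prior = "truncnormal",
  rank = 5
)

# Extract the MAP estimate of E (factors x individuals)
MAP <- result$get_MAP()
E <- as.matrix(MAP$E)
tE <- t(E)  # transpose to individuals x factors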

Comparison of EFA and NMF

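library(corrplot)

# Correlations between NMF factor expressions and EFA factor scores
allScores <- cbind(tE, efa_scores)
round(cor(allScores), 3)

corrplot(
  cor(allScores),
  method = "color",        # fill cells with colors
  type = "full",           # show the full matrix
  addCoef.col = "black",   # print correlation coefficients
  number.cex = 0.7,        # size of the coefficient text
  tl.cex = 0.8             # size of the variable labels
)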
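
Because the order of NMF factors is arbitrary, it may help to look only at the cross-correlation block (NMF factors in the rows, EFA factors in the columns) and pair each NMF factor with its most strongly correlated EFA factor. The sketch below runs on simulated stand-ins for `tE` and `efa_scores`, not on the real data, and the pairing rule is an assumption chosen for simplicity.

```r
# Simulated stand-ins (hypothetical): 200 individuals, 3 factors
set.seed(1)
n <- 200
efa_scores <- matrix(rnorm(n * 3), n, 3,
                     dimnames = list(NULL, paste0("EFA", 1:3)))

# Fake NMF scores: permuted, noisy, shifted copies of the EFA scores
tE <- efa_scores[, c(2, 3, 1)] + matrix(rnorm(n * 3, sd = 0.3), n, 3)
tE <- tE - min(tE)  # shift so that all values are non-negative
colnames(tE) <- paste0("NMF", 1:3)

# Cross-correlation block: rows = NMF factors, columns = EFA factors
cross <- cor(tE, efa_scores)

# For each NMF factor, the EFA factor with the largest absolute correlation
best <- colnames(cross)[apply(abs(cross), 1, which.max)]
names(best) <- rownames(cross)
print(best)
```

This greedy rule can assign the same EFA factor to several NMF factors; when that happens, the full cross-correlation block should be inspected rather than relying on the pairing alone.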

CFA

library(lavaan)

# Five-factor measurement model, nine items per factor
model <- "
SMD =~ bessi_1 + bessi_6 + bessi_11 + bessi_16 + bessi_21 + bessi_26 + bessi_31 + bessi_36 + bessi_41
IND =~ bessi_5 + bessi_10 + bessi_15 + bessi_20 + bessi_25 + bessi_30 + bessi_35 + bessi_40 + bessi_45
COD =~ bessi_3 + bessi_8 + bessi_13 + bessi_18 + bessi_23 + bessi_28 + bessi_33 + bessi_38 + bessi_43
SED =~ bessi_2 + bessi_7 + bessi_12 + bessi_17 + bessi_22 + bessi_27 + bessi_32 + bessi_37 + bessi_42
ESD =~ bessi_4 + bessi_9 + bessi_14 + bessi_19 + bessi_24 + bessi_29 + bessi_34 + bessi_39 + bessi_44
"
# Fit the CFA, treating all items as ordinal
fit <- cfa(model = model, data = dati, ordered = TRUE)
summary(fit, standardized = TRUE)
fitMeasures(fit, fit.measures = c("rmsea", "srmr", "cfi", "nnfi"))

# Ten largest modification indices
modificationIndices(fit, sort. = TRUE)[1:10, ]